This report summarizes the results of the Month 7 multivariable marker Superlearner modeling analysis of vaccine recipients for the HVTN 505 HIV vaccine efficacy trial. The analysis and this report will be updated once the ELISpot Any Env and ADCP Mosaic markers are available.
Table 0.1 shows the 28 learner-screen combinations fed into the Superlearner. Table 2 shows the variable sets used as input feature sets for the Superlearner. The first variable set, baseline risk factors, comprises the same baseline factors adjusted for in the other correlates objectives of the SAP (RSA, Age, BMI and baseline risk score). For each set of Month 7 markers, both primary and exploratory markers are included, because the objective of this machine learning analysis is to be maximally inclusive and unbiased. In addition, all individual Month 7 markers that are constituents of one or more of the 12 markers are included; for example, the antigen-specific breadth score variables aggregate over readouts to a set of antigens. Thus the variable set “All BAMA IgG3 gp140 markers” in Table 2 includes all individual-antigen IgG3 gp140 markers as well as the IgG3 gp140 breadth score marker.
For each variable set, a point estimate and 95% confidence interval of the CV-AUC from the Superlearner fit summarize classification accuracy (Table 3 and Figure 1).
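To make the CV-AUC summary concrete, the following is a minimal sketch in Python of a weighted AUC, its average over validation folds, and a Wald-type 95% interval. It is an illustration under stated assumptions, not the procedure used for this report: the observation weights, the fold structure, and the standard error passed to `wald_ci` are all hypothetical inputs, and the report does not state how its intervals were computed. The function names are invented for this sketch.

```python
from statistics import NormalDist


def weighted_auc(scores, labels, weights):
    """Weighted fraction of case/control pairs ranked correctly (ties count 1/2)."""
    cases = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    num = 0.0
    den = 0.0
    for s1, w1 in cases:
        for s0, w0 in controls:
            pair_weight = w1 * w0
            den += pair_weight
            if s1 > s0:
                num += pair_weight
            elif s1 == s0:
                num += 0.5 * pair_weight
    return num / den


def cv_auc(folds):
    """Average the per-fold weighted AUC over the validation folds.

    Each fold is a (scores, labels, weights) tuple of held-out predictions.
    """
    aucs = [weighted_auc(scores, labels, weights) for scores, labels, weights in folds]
    return sum(aucs) / len(aucs)


def wald_ci(estimate, std_error, level=0.95):
    """Wald-type interval, truncated to [0, 1]; std_error is assumed given."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return max(0.0, estimate - z * std_error), min(1.0, estimate + z * std_error)


if __name__ == "__main__":
    # Hypothetical held-out predictions for two folds (not trial data).
    fold_a = ([0.4, 0.3, 0.5], [1, 0, 0], [1.0, 3.0, 1.0])
    fold_b = ([0.9, 0.1], [1, 0], [1.0, 1.0])
    estimate = cv_auc([fold_a, fold_b])
    print(estimate, wald_ci(estimate, std_error=0.05))
```

A CV-AUC near 0.5 corresponds to ranking cases and controls no better than chance, which is the reference point for reading the estimates reported for each variable set.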
The Appendix section of the report shows the results (forest plots, ROC curves and predicted probability plots) for each of the 15 variable sets, in decreasing order of CV-AUC.
| Learner | Screen |
|---|---|
| SL.mean | all |
| SL.bayesglm | all |
| SL.bayesglm | glmnet |
| SL.bayesglm | univar_logistic_pval |
| SL.bayesglm | highcor_random |
| SL.gam | glmnet |
| SL.gam | univar_logistic_pval |
| SL.gam | highcor_random |
| SL.glm | all |
| SL.glm | glmnet |
| SL.glm | univar_logistic_pval |
| SL.glm | highcor_random |
| SL.glm.interaction | all |
| SL.glm.interaction | glmnet |
| SL.glm.interaction | univar_logistic_pval |
| SL.glm.interaction | highcor_random |
| SL.glmnet.1 | all |
| SL.ksvm.polydot | glmnet |
| SL.ksvm.polydot | univar_logistic_pval |
| SL.ksvm.polydot | highcor_random |
| SL.ksvm.rbfdot | glmnet |
| SL.ksvm.rbfdot | univar_logistic_pval |
| SL.ksvm.rbfdot | highcor_random |
| SL.polymars | glmnet |
| SL.polymars | univar_logistic_pval |
| SL.polymars | highcor_random |
| SL.xgboost.4.no | all |
| SL.ranger.no | all |
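
The learner-screen library in the table above can also be written as a mapping from each learner to the screens it is paired with, which makes the count of 28 combinations easy to check. The sketch below is in Python purely for illustration; the learner and screen names are copied from the table as plain strings, and the dictionary layout is an assumption of this sketch, not the configuration format used in the analysis.

```python
# Learner -> screens, transcribed from the learner-screen table.
LIBRARY = {
    "SL.mean": ["all"],
    "SL.bayesglm": ["all", "glmnet", "univar_logistic_pval", "highcor_random"],
    "SL.gam": ["glmnet", "univar_logistic_pval", "highcor_random"],
    "SL.glm": ["all", "glmnet", "univar_logistic_pval", "highcor_random"],
    "SL.glm.interaction": ["all", "glmnet", "univar_logistic_pval", "highcor_random"],
    "SL.glmnet.1": ["all"],
    "SL.ksvm.polydot": ["glmnet", "univar_logistic_pval", "highcor_random"],
    "SL.ksvm.rbfdot": ["glmnet", "univar_logistic_pval", "highcor_random"],
    "SL.polymars": ["glmnet", "univar_logistic_pval", "highcor_random"],
    "SL.xgboost.4.no": ["all"],
    "SL.ranger.no": ["all"],
}

# Flatten to (learner, screen) pairs, one per table row.
PAIRS = [(learner, screen) for learner, screens in LIBRARY.items() for screen in screens]

if __name__ == "__main__":
    print(len(PAIRS))  # 28
```

Written this way, it is apparent that only some learners are run on the unscreened feature set (`all`), while `SL.gam`, the two `SL.ksvm` learners and `SL.polymars` are always paired with a screen.
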
| Variable Set Name | Variables included in the set |
|---|---|
| 1_baselineFactors | Baseline risk factors only (Reference model) |
| 2_M7_ELISA | Baseline risk factors + M7 ELISA |
| 4_M7_ADCP | Baseline risk factors + M7 ADCP |
| 5_M7_IgG3 | Baseline risk factors + M7 IgG3 |
| 6_M7_IgG3gp140 | Baseline risk factors + M7 IgG3 gp140 |
| 7_M7_IgG3gp120 | Baseline risk factors + M7 IgG3 gp120 |
| 8_M7_IgG3V1V2 | Baseline risk factors + M7 IgG3 V1V2 |
| 9_M7_IgG3gp41 | Baseline risk factors + M7 IgG3 gp41 |
| 10_M7_IgG3bScores | Baseline risk factors + M7 IgG3 Breadth Scores |
| 11_M7_IgG3multi | Baseline risk factors + M7 IgG3 Multi-Epitope breadth |
| 12_M7_IgG3overall | Baseline risk factors + M7 Overall score across assays |
| 14_2+4 | Baseline risk factors + M7 ELISA + M7 ADCP |
| 15_2+5 | Baseline risk factors + M7 ELISA + M7 IgG3 |
| 18_4+5 | Baseline risk factors + M7 ADCP + M7 IgG3 |
| 22_2+4+5 | Baseline risk factors + M7 ELISA + M7 ADCP + M7 IgG3 |
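
The combined variable sets at the end of the table appear to follow a naming pattern in which the part after the underscore lists the indices of the single-assay sets being merged (for example, `14_2+4` merges sets 2 and 4). The short Python sketch below expands such a name into its description. The pattern is inferred from the table rows only, and the `GROUPS` mapping and `expand` helper are hypothetical names introduced for this illustration.

```python
BASE = "Baseline risk factors"

# Single-assay marker groups that appear in combined sets, keyed by set index.
GROUPS = {2: "M7 ELISA", 4: "M7 ADCP", 5: "M7 IgG3"}


def expand(set_name):
    """Expand a combined set name such as '22_2+4+5' into its description."""
    _, _, spec = set_name.partition("_")
    components = [GROUPS[int(part)] for part in spec.split("+")]
    return " + ".join([BASE] + components)


if __name__ == "__main__":
    for name in ["14_2+4", "15_2+5", "18_4+5", "22_2+4+5"]:
        print(name, "->", expand(name))
```

Every combined set still contains the baseline risk factors, so each model can be compared against the reference model in the first row.
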
| Variable set | CV-AUC (95% CI) |
|---|---|
| 1_baselineFactors | 0.584 [0.483, 0.685] |
| 10_M7_IgG3bScores | 0.524 [0.419, 0.629] |
| 4_M7_ADCP | 0.523 [0.419, 0.627] |
| 6_M7_IgG3gp140 | 0.521 [0.421, 0.622] |
| 5_M7_IgG3 | 0.518 [0.416, 0.620] |
| 12_M7_IgG3overall | 0.516 [0.412, 0.620] |
| 9_M7_IgG3gp41 | 0.510 [0.404, 0.615] |
| 8_M7_IgG3V1V2 | 0.508 [0.404, 0.612] |
| 18_4+5 | 0.505 [0.404, 0.606] |
| 2_M7_ELISA | 0.505 [0.400, 0.609] |
| 22_2+4+5 | 0.501 [0.399, 0.603] |
| 14_2+4 | 0.498 [0.391, 0.604] |
| 11_M7_IgG3multi | 0.495 [0.392, 0.598] |
| 15_2+5 | 0.495 [0.393, 0.597] |
| 7_M7_IgG3gp120 | 0.491 [0.386, 0.595] |
Figure 0.1: Forest plot showing Superlearner performance (weighted CV-AUC with 95% CI) across all 15 variable sets.
Forest plots, ROC curves and predicted probability plots are shown for each variable set.
Figure 1.1: Variable set “1_baselineFactors”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.2: Variable set “1_baselineFactors”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.3: Variable set “1_baselineFactors”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.4: Variable set “10_M7_IgG3bScores”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.5: Variable set “10_M7_IgG3bScores”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.6: Variable set “10_M7_IgG3bScores”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.7: Variable set “4_M7_ADCP”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.8: Variable set “4_M7_ADCP”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.9: Variable set “4_M7_ADCP”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.10: Variable set “6_M7_IgG3gp140”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.11: Variable set “6_M7_IgG3gp140”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.12: Variable set “6_M7_IgG3gp140”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.13: Variable set “5_M7_IgG3”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.14: Variable set “5_M7_IgG3”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.15: Variable set “5_M7_IgG3”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.16: Variable set “12_M7_IgG3overall”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.17: Variable set “12_M7_IgG3overall”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.18: Variable set “12_M7_IgG3overall”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.19: Variable set “9_M7_IgG3gp41”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.20: Variable set “9_M7_IgG3gp41”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.21: Variable set “9_M7_IgG3gp41”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.22: Variable set “8_M7_IgG3V1V2”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.23: Variable set “8_M7_IgG3V1V2”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.24: Variable set “8_M7_IgG3V1V2”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.25: Variable set “18_4+5”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.26: Variable set “18_4+5”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.27: Variable set “18_4+5”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.28: Variable set “2_M7_ELISA”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.29: Variable set “2_M7_ELISA”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.30: Variable set “2_M7_ELISA”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.31: Variable set “22_2+4+5”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.32: Variable set “22_2+4+5”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.33: Variable set “22_2+4+5”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.34: Variable set “14_2+4”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.35: Variable set “14_2+4”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.36: Variable set “14_2+4”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.37: Variable set “11_M7_IgG3multi”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.38: Variable set “11_M7_IgG3multi”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.39: Variable set “11_M7_IgG3multi”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.40: Variable set “15_2+5”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.41: Variable set “15_2+5”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.42: Variable set “15_2+5”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.
Figure 1.43: Variable set “7_M7_IgG3gp120”: Weighted CV-AUC (95% CI) of algorithms for predicting HIV disease status after Day 210.
Figure 1.44: Variable set “7_M7_IgG3gp120”: Weighted CV-AUC ROC curves of top two individual learners along with Superlearner and discrete-SL.
Figure 1.45: Variable set “7_M7_IgG3gp120”: Weighted prediction probability plots of top two individual learners along with Superlearner and discrete-SL.